Name: Ke Chen, Abhinav Bannerjee

UT EID: kc35827, ab45393

General Steps

  1. Find the .csv data
  2. create the repository in Github. 3 create project with four directories.
  3. Using R_etl.r to reformat our datas and import data into ORACL SQL.
  4. Query datas from server, and create Tableau workbook.
  5. make three different grpah by using graph.
  6. make explanation on each plot.
  7. Knit R, combine all file into a html file.

New things add,

  1. We create a new dataset which can be blended by old data set.
  2. We create new barchart in Tableau after blend.
  3. We reproduce plots which we created last time.

R_ETL.R file operation

The R_ETL file generates a code for us to create a table in Oracle SQL.

## CREATE TABLE 2016_FE_Guide_for_DOE (
## -- Change table_name to the table name you want.
##  Mfr_Name varchar2(4000),
##  Division varchar2(4000),
##  Carline varchar2(4000),
##  Verify_Mfr_Cd varchar2(4000),
##  Transmission varchar2(4000),
##  Guzzler_ varchar2(4000),
##  Air_Aspiration_Method_Desc varchar2(4000),
##  Trans_Desc varchar2(4000),
##  Lockup_Torque_Converter varchar2(4000),
##  Trans_Creeper_Gear varchar2(4000),
##  Drive_Desc varchar2(4000),
##  Fuel_Usage_Desc varchar2(4000),
##  Carline_Class_Desc varchar2(4000),
##  Release_Date varchar2(4000),
##  Var_Valve_Timing_ varchar2(4000),
##  Var_Valve_Lift_ varchar2(4000),
##  Energy_Storage_Device_Desc varchar2(4000),
##  Battery_Type_Desc varchar2(4000),
##  Batt_Charger_Type_Desc varchar2(4000),
##  Regen_Braking_Type_Desc varchar2(4000),
##  Regen_Braking_Wheels_Source varchar2(4000),
##  Driver_Cntrl_Regen_Braking_ varchar2(4000),
##  Motor_Gen_Type_Desc varchar2(4000),
##  Fuel_Metering_Sys_Desc varchar2(4000),
##  Fuel_Cell_Vehicle varchar2(4000),
##  Off_Board_Charge_Capable varchar2(4000),
##  Camless_Valvetrain varchar2(4000),
##  Oil_Viscosity varchar2(4000),
##  Stop_Start_Engine varchar2(4000),
##  Model_Year number(38,4),
##  Eng_Displ number(38,4),
##  Numbers_of_Cyl number(38,4),
##  City_FE_Guide number(38,4),
##  Hwy_FE_Guide number(38,4),
##  Comb_FE_Guide number(38,4),
##  Numbers_of_Gear number(38,4),
##  Max_Ethanol_Gasoline number(38,4),
##  Annual_Fuel_Cost number(38,4),
##  Calculated_Annual_Fuel_Cost number(38,4),
##  EPA_FE_Label_Dataset_ID number(38,4),
##  Numbers_of_Batteries number(38,4),
##  Total_Voltage_for_Battery number(38,4),
##  Batt_Energy_Cpacity_Amp_hrs number(38,4),
##  Batt_Specific_Energy_Watt number(38,4),
##  Numbers_of_Drive_Motor_Gen number(38,4),
##  Rated_Motor_Gen_Power_kW number(38,4),
##  Calculated_Gas_Guzzler_MPG number(38,4),
##  FE_Rating_1_10 number(38,4),
##  five_years_savings number(38,4),
##  five_years_spend number(38,4),
##  City_CO2_Rounded_Adjusted number(38,4),
##  Hwy_CO2_Rounded_Adjusted number(38,4),
##  Comb_CO2_Rounded_Adjusted number(38,4)
##  );

Blended Data

We create a dataset by ourselves. By adding countries from for each division.

visualization

Scatter Plot

This visualization shows the most efficient vehicles based on mileage and annual fuel costs incurred on the driver. Naturally, economy class cars tend to be more efficient and cost effective(cost less).

Here is the plot produced by Rstudio.

## 
## > ggplot() + scale_colour_gradient(name = "Comb Fe Guide", 
## +     low = "green", high = "green4") + coord_cartesian() + scale_x_continuous() + 
## +      .... [TRUNCATED]

Crosstab

This visualization displays the most efficient cars and segments car models based on transmission and drive type. Front 2-wheel driven cars seem to be most efficient, regardless transmission type.

Here is the plot produced by Rstudio.

## 
## > KPI_Low = 20
## 
## > KPI_Medium = 31
## 
## > a <- df %>% select(TRANS_DESC, DRIVE_DESC, CARLINE, 
## +     COMB_FE_GUIDE, FE_RATING_1_10) %>% group_by(TRANS_DESC, DRIVE_DESC) %>% 
## +     summarise( .... [TRUNCATED] 
## 
## > a <- na.omit(a)
## 
## > a <- a %>% mutate(KPI_COMB_FUEL = ifelse(as.numeric(as.character(comb_fe)) < 
## +     KPI_Low, "Less than 20", ifelse(as.numeric(as.character(comb_fe) .... [TRUNCATED] 
## 
## > ggplot() + coord_cartesian() + scale_x_discrete() + 
## +     scale_y_discrete() + scale_x_discrete(labels = c("Automated Manual", 
## +     "Automated Ma ..." ... [TRUNCATED]

Bar Chart

This visualization breaks down cars by manufacturer and make. It then proceeds to aggregate fuel efficiency ratings for each car (and distinguishes between transmission types). Sports vehicles are clearly below the median mileage here. The reference line stands for the median Combinantion Fuel. And we can see clearly which carline are below the median, and which are above the median.

Here is the plot produced by Rstudio. Because we have too much carlines, and Rstudio cannot show them in one image, we tried to remove the carline, and keep only division and transmission type in the plot.

## 
## > COMB_FE_ <- df %>% select(COMB_FE_GUIDE)
## 
## > COMB_FE_[COMB_FE_ == "null"] <- NA
## 
## > df$TRANS_DESC[df$TRANS_DESC == "null"] <- NA
## 
## > df <- na.omit(df)
## 
## > COMB_FE_ <- na.omit(COMB_FE_)
## 
## > COMB_FE_$COMB_FE_GUIDE <- as.numeric(as.character(COMB_FE_$COMB_FE_GUIDE))
## 
## > comb_fe <- colMeans(COMB_FE_)
## 
## > ggplot() + coord_cartesian() + scale_x_discrete() + 
## +     scale_y_discrete() + facet_wrap(~DIVISION, ncol = 6) + labs(title = "Fuel Economy per Div ..." ... [TRUNCATED]

New Bar Chart

We put country and division in x axis, and median comb fe guide in y axis. We put fe rating as color. We can see which country produce more efficient cars.